Replication file for "Which Local Governments Adopt New Technology First? Agency Size and Bureaucratic Champions for Open Transit Data" by Ishana Ratan, Alison E. Post, Tanu Kumar, Mridang Sheth
January 20, 2025

This replication file contains the following: 

1. R/Python scripts: 

gtfs_data_clean_r&r.R: R file with data cleaning script
acs_buffers: Folder with Python script and shapefiles to calculate demographic characteristics from GTFS buffers
gtfs_analsysis_r&r.R: R file containing analysis code and figures for the study

2. Raw Data:
    a. Original Data
match_data.csv: Longitudinal web-scraped GTFS utilization data (2013 - 2021)
sources.tab: Data on GTFS-RT adoption sourced from Mobility Data (https://mobilitydata.org/)
gtfs_census_merged.xlsx: Agency-level demographic characteristics calculated from the American Community Survey and GTFS route buffer overlays in ArcGIS Pro (the Python script used to create this file is saved in acs_buffers)
gtfs_census_merged_min.xlsx: Minimum value imputation robustness check of agency-level demographic data
gtfs_census_merged_max.xlsx: Maximum value imputation robustness check of agency-level demographic data
2.24TransportationSurvey_October 30, 2023_12.53.xlsx: Raw survey responses
transit_survey.tab: Cleaned list of agencies surveyed with NTD ID

    b. Agency + Geographic Variables: 

i. Principal City Designation:

list2_2013.xls: List of principal cities in 2013 (https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/historical-delineation-files.html)
list2_2015.xls: List of principal cities in 2015 (https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/historical-delineation-files.html)
list2_Sep_2018.xls: List of principal cities in 2018 (https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/historical-delineation-files.html)
list2_2020.xls: List of principal cities in 2020 (https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/historical-delineation-files.html)
counties.xlsx: List of cities by county in California (https://www.downloadexcelfiles.com/us_en/download-excel-file-list-cities-california-state)

ii. Agency Information

2013 Agency Information_0.xlsx: National Transit Database (NTD) agency characteristics (population, service area, organization type) (2013) (https://www.transit.dot.gov/ntd/ntd-data?field_data_categories_target_id%5B2511%5D=2511&field_product_type_target_id=All&year=all&combine=)
2016 Agency Information.xlsx: NTD agency characteristics (2016) (https://www.transit.dot.gov/ntd/ntd-data?field_data_categories_target_id%5B2511%5D=2511&field_product_type_target_id=All&year=all&combine=)
2018 Agency Info.xlsx: NTD agency characteristics (2018) (https://www.transit.dot.gov/ntd/ntd-data?field_data_categories_target_id%5B2511%5D=2511&field_product_type_target_id=All&year=all&combine=)
2018 Agency Mode TOS.xlsx: NTD modes of service operated by each transit agency (2018) (https://www.transit.dot.gov/ntd/ntd-data?field_data_categories_target_id%5B2511%5D=2511&field_product_type_target_id=All&year=all&combine=)
2021 Agency Information.xlsx: NTD agency characteristics (2021) (https://www.transit.dot.gov/ntd/ntd-data?field_data_categories_target_id%5B2511%5D=2511&field_product_type_target_id=All&year=all&combine=)
UPT.xlsx: NTD monthly unlinked passenger trips (2002 - 2021) (https://www.transit.dot.gov/ntd/data-product/monthly-module-adjusted-data-release)
2023 TS1.1 Total Funding Time Series.xlsx: NTD annual operating expenses (1991 - 2023) (https://www.transit.dot.gov/ntd/data-product/ts11-total-funding-time-series-2)

3. Cleaned data: 
gtfs.xlsx: Dataframe with all GTFS adopters and updates (2013-2021) for descriptive figures
gtfsx.xlsx: Dataframe for survival analysis (2013-2021) of GTFS adoption
gtfs19.xlsx: Dataframe for longitudinal analysis (2019-2021) of GTFS utilization
gtfsmax.xlsx: Dataframe for longitudinal analysis (2019-2021) with maximum ACS values imputed (same variable names as gtfs19)
gtfsmin.xlsx: Dataframe for longitudinal analysis (2019-2021) with minimum ACS values imputed (same variable names as gtfs19)
riders.xlsx: Dataframe for longitudinal analysis of GTFS adoption and ridership (2013-2021)
rtpd.xlsx: Dataframe of survey respondents adopting real time public transportation data and rationale for adoption

Variable descriptions for gtfsx.xlsx are below: 

1) AgencyName: Name of a given agency as listed by the NTD
2) NTDID: National Transit Database Identification Number
3) Legacy NTD ID: Old NTD ID, used to match with 2013 agency data
4) indicator: Binary indicator of GTFS adoption (1) or non-adoption (0)
5) City: City in which agency is headquartered
6) orgtype: Organization type as classified by NTD 
7) pop: Total population in service area (NTD 2018)
8) reptype: NTD reporter type classification
9) VOMS: NTD reported vehicles operated in maximum service
10) mile: Total service area square miles (NTD 2018)
11) poplog: Log of service population 
12) milelog: Log of service area square miles 
13) PC13: Principal City Status as of 2013
14) PC15: Principal City Status as of 2015
15) PC18: Principal City Status as of 2018
16) PC20: Principal City Status as of 2020
17) time: Time in years from the start of analysis (2013) to GTFS adoption; for non-adopters, time in years from 2013 to the most recent date
18) poplog13: Logged agency population in 2013	
19) milelog13: Logged agency service area in 2013	
20) pop13: Agency population in 2013	
21) mile13: Agency service area in 2013
22) poplog16: Logged agency population in 2016
23) milelog16: Logged agency service area in 2016
24) poplog21: Logged agency population in 2021	
25) milelog21: Logged agency service area in 2021	
26) revenue13: Agency revenue in 2013	
27) revlog13: Logged agency revenue in 2013
28) revenue16: Agency revenue in 2016	
29) revlog16: Logged agency revenue in 2016
30) revenue21: Agency revenue in 2021	
31) revlog21: Logged agency revenue in 2021
32) gtfsrt: Binary indicator of GTFS-realtime adoption (1) or non-adoption (0)
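
As a rough illustration of how indicator, time, and poplog relate to one another, the sketch below builds them for a single hypothetical agency. It is not taken from gtfs_data_clean_r&r.R: the function name survival_row, the use of 2021 as the last observed year for non-adopters, whole-year time units, and the natural log are all assumptions made for this example.

```python
import math

START_YEAR = 2013  # start of analysis, per the description of `time`
END_YEAR = 2021    # ASSUMPTION: last observed year for non-adopters


def survival_row(pop, adoption_year=None):
    """Return (indicator, time, poplog) for one hypothetical agency.

    adoption_year is None for an agency that never adopted GTFS.
    """
    indicator = 1 if adoption_year is not None else 0
    # Adopters are timed to their adoption year; non-adopters to END_YEAR.
    last_year = adoption_year if indicator else END_YEAR
    time = last_year - START_YEAR
    # ASSUMPTION: natural log; the README does not state the base.
    poplog = math.log(pop)
    return indicator, time, poplog


# Hypothetical agencies: one adopting in 2016, one never adopting.
adopter = survival_row(pop=100_000, adoption_year=2016)
non_adopter = survival_row(pop=50_000)
```

Under these assumptions the adopter has indicator 1 and time 3, and the non-adopter has indicator 0 and time 8.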


Variable descriptions for gtfs19.xlsx are below: 
1) NTDID: National Transit Database Identification Number
2) updates: Monthly updates for each agency
3) routes: Number of routes an agency serves
4) routes/updates: Number of routes divided by number of updates in a given month
5) yrmo: Year-month date
6) updates_sum: Cumulative number of updates per agency (2013 - 2020)
7) year: Calendar year
8) adopt: Binary longitudinal indicator of adoption (1) or non-adoption (0)
9) revenue: Total operating revenue (2018), no rural reporters
10) orgtype: Organization type as classified by NTD 2018
11) pop: Total population in service area (NTD 2018)
12) reptype: NTD reporter type classification
13) VOMS: NTD reported vehicles operated in maximum service
14) mile: Total service area square miles (NTD 2018)
15) AgencyName: Name of a given agency as listed by the NTD
16) Legacy NTD ID: Old NTD ID
17) poplog: Log of service population 
18) milelog: Log of service area square miles 
19) PC13: Principal City Status as of 2013
20) PC15: Principal City Status as of 2015
21) PC18: Principal City Status as of 2018
22) PC20: Principal City Status as of 2020
23) blocks: Number of census block groups within each agency GTFS route buffer
24) acs_pop: Total population within each agency GTFS route buffer
25) age: Median age, weighted by population, within agency GTFS route buffer
26) internet: Number of people with internet subscription within agency GTFS route buffer
27) income: Median household income for the agency, calculated as the sum of block group income weighted by block group population
28) white: Number of white people within each agency GTFS route buffer
29) black: Number of Black people within each agency GTFS route buffer
30) employed: Number of employed people within each agency GTFS route buffer
31) asian: Number of Asian people within each agency GTFS route buffer
32) pc_nonwhite: Percentage of agency population that is non-white (1 minus the ratio of white to acs_pop)
33) pc_employed: Percentage of agency population that is employed (employed divided by population)
34) logHHI: Log of median household income
35) pc_internet: Percentage of agency population with internet (internet subscription divided by population)
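
The derived demographic variables above can be sketched as follows. This is an illustrative reading of the descriptions, not the code in acs_buffers or the R cleaning script: the function names are hypothetical, "population" in the denominators is assumed to mean acs_pop, shares are left as proportions rather than multiplied by 100, the income weights are assumed to be each block group's share of the buffer population, and the natural log is assumed for logHHI.

```python
import math


def derived_shares(white, employed, internet, acs_pop):
    """Shares for one agency buffer (ASSUMPTION: denominators are acs_pop)."""
    return {
        "pc_nonwhite": 1 - white / acs_pop,
        "pc_employed": employed / acs_pop,
        "pc_internet": internet / acs_pop,
    }


def weighted_income(block_groups):
    """Population-weighted income over (median_income, population) pairs.

    ASSUMPTION: weights are each block group's share of total population.
    """
    total_pop = sum(pop for _, pop in block_groups)
    return sum(income * pop for income, pop in block_groups) / total_pop


# Hypothetical buffer with 1,000 residents and two block groups.
shares = derived_shares(white=600, employed=500, internet=800, acs_pop=1000)
income = weighted_income([(50_000, 100), (70_000, 300)])
log_hhi = math.log(income)  # ASSUMPTION: natural log for logHHI
```

With these made-up counts, pc_nonwhite is 0.4, pc_employed is 0.5, pc_internet is 0.8, and the weighted income is 65,000.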

Variable descriptions for riders.xlsx are below: 
1) NTDID: National Transit Database Identification Number
2) yrmo: Year-month date
3) updates: Number of GTFS updates in a month
4) routes: Number of routes an agency serves
5) routes/updates: Number of routes divided by number of updates in a month
6) updates_sum: Cumulative number of updates per agency (2013 - 2020)
7) year: Calendar year
8) adopt: Binary longitudinal indicator of adoption (1) or non-adoption (0)
9) revenue: Total operating revenue (2018)
10) orgtype: Organization type as classified by NTD 2018
11) pop: Total population in service area (NTD 2018)
12) reptype: NTD reporter type classification
13) VOMS: NTD reported vehicles operated in maximum service
14) mile: Total service area square miles (NTD 2018)
15) AgencyName: Name of a given agency as listed by the NTD
16) poplog: Log of service population (2018)
17) milelog: Log of service area square miles (2018)
18) PC13: Principal City Status as of 2013
19) PC15: Principal City Status as of 2015
20) PC18: Principal City Status as of 2018
21) PC20: Principal City Status as of 2020
22) UPT: Total number of unlinked passenger trips for all transportation modes
23) lagUPT: One month lag of unlinked passenger trips
24) pct_change: Percent change in monthly lag of unlinked passenger trips
25) abs_pctchange: Absolute value of percent change in monthly lag of unlinked passenger trips
26) logpass: Log of the lagged raw number of unlinked passenger trips
27) lagadopt: One year lag of GTFS adoption
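
The lagged ridership variables can be sketched for a single agency as below. This follows one literal reading of the descriptions and is not the authors' code: pct_change is taken as the month-over-month percent change of the lagUPT series itself, logpass as the natural log of lagUPT, and the function name ridership_columns is hypothetical. The actual scripts may define these differently.

```python
import math


def ridership_columns(upt):
    """Build lag-based columns from one agency's monthly UPT values.

    upt must be in chronological order. Entries that cannot be computed
    (first months, zero denominators) are returned as None.
    """
    lag_upt = [None] + upt[:-1]  # one-month lag of UPT
    rows = []
    for i, current in enumerate(upt):
        lag = lag_upt[i]
        prev_lag = lag_upt[i - 1] if i > 0 else None
        # ASSUMPTION: percent change is computed on the lagged series.
        if lag is not None and prev_lag not in (None, 0):
            pct = 100 * (lag - prev_lag) / prev_lag
        else:
            pct = None
        rows.append({
            "UPT": current,
            "lagUPT": lag,
            "pct_change": pct,
            "abs_pctchange": abs(pct) if pct is not None else None,
            # ASSUMPTION: natural log; undefined for missing or zero lag.
            "logpass": math.log(lag) if lag else None,
        })
    return rows


# Hypothetical four months of ridership for one agency.
rows = ridership_columns([100, 120, 90, 99])
```

For this made-up series the third month has lagUPT 120 and pct_change 20.0, and the fourth month has lagUPT 90, pct_change -25.0, and abs_pctchange 25.0.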
